In 2002, severe acute respiratory syndrome (SARS)-coronavirus (CoV) appeared as a novel human virus with high similarity to bat coronaviruses. However, while SARS-CoV uses the human angiotensin-converting enzyme 2 (ACE2) receptor for cellular entry, no coronavirus isolated from bats appears to use ACE2. Here we show that signatures of recurrent positive selection in the bat ACE2 gene map almost perfectly to known SARS-CoV interaction surfaces. Our data indicate that ACE2 utilization preceded the emergence of SARS-CoV-like viruses from bats.


Cell-surface receptors often play a key role in defining viral host range. New diseases can emerge when existing viruses evolve the ability to bind the ortholog of their cell-surface receptor in a new species (1, 25, 35). In coronaviruses, the principal genetic determinant of host range is the spike protein on the surface of the virus and, in particular, its receptor-binding domain (RBD) (5, 14). The severe acute respiratory syndrome (SARS) epidemic is thought to have resulted from the zoonotic transmission of a coronavirus from bats to humans (15, 18, 32). The central role of the RBD in the SARS-coronavirus (CoV) zoonosis was demonstrated by an experiment in which a bat coronavirus became infectious in primate cells after it was altered to contain the RBD of human SARS-CoV (2).

Bats are thought to have initially infected one or more species of small mammals, such as the palm civet (6, 13, 20, 37). One theory is that this intermediate host provided a selective environment that drove the coronavirus RBD to acquire point mutations that made it compatible with the human ortholog of its cell-surface receptor, angiotensin-converting enzyme 2 (ACE2) (19, 21, 30, 31). However, one key observation has led the field to favor alternative, more complex theories of emergence: although SARS-CoV and closely related viruses from the civet can use ACE2 as a receptor, no bat coronavirus has been shown to use bat, human, or any other ortholog of ACE2 (2, 27). Further,

TABLE 1 Positive selection of bat ACE2 codons 1 to 358

Seed value, codon modelᵃ | M1a vs M2a: 2ΔlnLᵇ (P value) | M7 vs M8: 2ΔlnLᵇ (P value) | M8a vs M8: 2ΔlnLᵇ (P value) | dN/dS value (% of codons)ᶜ | Residues under positive selectionᵈ
0.4, f61 | [illegible] (P < 0.0001) | 56.5 (P < 0.0001) | 52.0 (P < 0.0001) | 4.3 (11) | Q24**, T27**, K31**, H34**, M82*, L91*, T92, N159*, V212, D215**, D216**, E231*, S280, V298, A301, E329
0.4, f3×4 | 56.5 (P < 0.0001) | 56.4 (P < 0.0001) | [illegible] (P < 0.0001) | 4.3 (11) | Q24**, T27**, K31*, H34, M82**, L91, T92, N159*, V212**, D215*, D216*, E231*, S280, V298*, A301, E329
1.6, f61 | [illegible] (P < 0.0001) | 56.3 (P < 0.0001) | 52.8 (P < 0.0001) | 4.3 (11) | Q24**, T27**, K31**, H34**, M82*, L91*, T92, N159*, V212, D215**, D216**, E231*, S280, V298, A301, E329
1.6, f3×4 | 56.5 (P < 0.0001) | 56.4 (P < 0.0001) | 56.1 (P < 0.0001) | 4.3 (11) | Q24**, T27**, K31, H34, M82, L91, T92, N159*, V212**, D215**, D216**, E231*, S280, V298**, A301, E329

ᵃ Initial seed value for ω (dN/dS) and model of codon frequency (f61 or f3×4).


ᵇ Twice the difference in the natural logs of the likelihoods (2ΔlnL) of the two models being compared. This value is used in a likelihood ratio test along with the degrees of freedom. In all cases (M1a versus M2a, M7 versus M8, and M8a versus M8), a model that allows positive selection is compared to a null model that does not. The P value is the probability of obtaining a 2ΔlnL at least this large if the null model were true; small values indicate that the null model can be rejected.

ᶜ dN/dS value of the class of codons evolving under positive selection in M8 and the percentage of codons falling in that class.

ᵈ Residues corresponding to codons assigned to the class with a dN/dS ratio of >1 in M8 (posterior probability P > 0.90 by naive empirical Bayes [NEB]). Coordinates correspond to the human protein, although the human sequence was not used in this analysis. Bat numerical coordinates are identical with the exception of three species with single-codon insertions or deletions (see alignment in Fig. S1 in the supplemental material). *, P > 0.95; **, P > 0.99. Three additional codons were identified in the analysis of the full-length gene (see Table S2 in the supplemental material).
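The likelihood ratio tests in footnote b and the posterior-probability markers in footnote d can be reproduced from codeml output with a little arithmetic. The sketch below is a minimal illustration under stated assumptions. It assumes the usual degrees of freedom for these PAML site-model pairs: 2 for M1a vs M2a and for M7 vs M8, and 1 for M8a vs M8. It also uses the plain χ² distribution, which is slightly conservative for M8a vs M8. The function names are hypothetical and are not part of PAML.

```python
import math
from typing import Optional

# Assumed degrees of freedom: the number of extra free parameters in the
# selection model relative to its null model (standard PAML site models).
DF = {"M1a vs M2a": 2, "M7 vs M8": 2, "M8a vs M8": 1}


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df = 1 or 2."""
    if x <= 0:
        return 1.0
    if df == 1:
        # X = Z**2, so P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # A chi-square variable with 2 df is exponential with mean 2.
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def lrt_p_value(two_delta_lnl: float, comparison: str) -> float:
    """P value for 2ΔlnL = 2 * (lnL_selection_model - lnL_null_model)."""
    return chi2_sf(two_delta_lnl, DF[comparison])


def neb_marker(posterior: float) -> Optional[str]:
    """Table 1 convention: listed if P > 0.90, '*' if P > 0.95, '**' if P > 0.99.

    Returns None for codons that would not be listed at all.
    """
    if posterior > 0.99:
        return "**"
    if posterior > 0.95:
        return "*"
    if posterior > 0.90:
        return ""
    return None


# 2ΔlnL values from the first row of Table 1 (seed 0.4, f61).
for name, stat in [("M7 vs M8", 56.5), ("M8a vs M8", 52.0)]:
    print(f"{name}: 2ΔlnL = {stat}, P = {lrt_p_value(stat, name):.1e}")
```

Both P values fall far below 0.0001, which is consistent with the table. The marker thresholds mirror footnote d. For example, a codon with a posterior probability of 0.97 would be listed with a single asterisk.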
Evidence for ACE2 Usage by Bat Coronaviruses
FIG 1 Residues under positive selection in bat ACE2 correspond to human ACE2 residues that interact with the SARS-CoV spike. (a) Six residues under positive selection (red) in bat ACE2 map to the SARS-CoV-binding surface (orange and red) of human ACE2 (green) and are in direct contact with the SARS-CoV spike (gray) in a cocrystal structure (PDB 2AJF) (17). (b) Bat species used in the ACE2 analysis and the amino acids encoded at the six residue positions that directly contact the SARS-CoV spike and are evolving under positive selection. Bat polymorphisms have been reported at some of these positions (11), and a human polymorphism is found at one of them. (c) Detailed view of the side chains of five of these residues under positive selection (red) in ACE2 (green), along with the side chains of cognate contacts in the SARS-CoV spike (light gray). (d) Cocrystal structures have been solved for human ACE2 in complex with the spike proteins of both SARS-CoV (17) and NL63-CoV (39). ACE2 residues that mediate contact with each virus are indicated. Residues under positive selection in bat ACE2 are indicated in red.


sequence-based studies of the coronaviruses found in bats suggest that their RBDs contain deletions spanning key residues required for contact with ACE2 (5, 15, 18, 20). These observations necessitated alternative models of SARS-CoV emergence. The currently favored model is one in which a bat coronavirus recombined with the coronavirus of a second, unknown species to create a novel hybrid virus that can use ACE2 (20). Discriminating between these two models of viral emergence (ACE2 usage preexisted in the bat reservoir versus ACE2 usage was acquired outside this reservoir) is important for understanding the evolutionary events that generated SARS-CoV. We tested these two models by examining the evolution of the ACE2 receptor in bats.

Over long periods of time, coevolutionary dynamics can develop between viruses and their hosts (24). For example, host populations experience natural selection for receptor mutations that reduce virus binding affinity, and viruses are, in turn, selected for mutations that increase affinity for the new receptor variants. This back-and-forth selection results in rapid evolution of both the host receptor and the virus surface protein. The rate of protein evolution can be analyzed by comparing the rates at which nonsynonymous (dN; changing the encoded amino acid) and synonymous (dS; silent) substitutions accumulate in the underlying gene, each measured per site of its kind (24, 41). Most genes accumulate nonsynonymous substitutions more slowly than synonymous ones (dN/dS < 1) because protein-altering mutations tend to be deleterious (24). However, signatures of recurrent positive selection (dN/dS > 1) have been shown to accumulate in gene regions encoding the physical interaction interface between virus and host proteins, and specifically in codons encoding key residues that modulate these interactions (4, 7, 22, 23, 29). Starting with a data set of partial ACE2 sequences from 11 bat species (codons 1 to 358, containing the SARS-CoV interaction domain of human ACE2) (see Table S1 in the supplemental material) or full-length ACE2
FIG 2 Positive selection of residues at the base of a key ACE2 glycan. (a) A linear schematic of the ACE2 protein. Regions of the protein that interact with the SARS-CoV spike are indicated in dark gray (17). Residue positions found to be under positive selection in bats are shown with black tick marks. Six of these fall in the known surface of interaction with the SARS-CoV spike, and 13 more are indicated with numbers. Of these, five (in red type) are positioned at the base of a key glycan on the receptor that is located at position 90. (b) A rotated view of the structure shown in Fig. 1a, with the main SARS-CoV-binding surface now at the left. The glycosylated asparagine at position 90 is shown in orange, with five residues under positive selection sitting in a ridge adjacent to it (red).


sequences available for 8 of these species, DNA alignments were fit to different models of codon evolution using the codeml program in PAML (40). Some of these models allow a subset of codons to evolve under positive selection (M2a and M8), while others do not (M1a, M7, and M8a). In both data sets, models that allow positive selection fit the data significantly better than the corresponding null models (P < 0.0001) (Table 1; see also Table S2 in the supplemental material). In total, 19 codons were assigned a dN/dS ratio greater than 1 with high posterior probability. The partial-gene analysis identified more of these codons because it included more species (Table 1; see also Table S2 in the supplemental material). These 19 codons in bat ACE2 have experienced recurrent selection for mutations that replace the encoded amino acid, and for this reason these positions are highly variable at the protein level (see Fig. S1 in the supplemental material).
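To make the dN and dS quantities concrete, the sketch below counts synonymous and nonsynonymous differences per site between two aligned coding sequences, in the spirit of the Nei-Gojobori counting method. This is a deliberately simplified, swapped-in technique and not the maximum-likelihood codon models that codeml fits. It assumes the standard genetic code and applies no correction for multiple substitutions. It also rejects codons that differ at more than one position, because handling those would require averaging over mutational pathways. The function names are hypothetical.

```python
from itertools import product
from typing import Tuple

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order, first position slowest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon: str) -> float:
    """Number of synonymous sites in a codon (0 to 3).

    Each of the 9 possible single-base changes contributes 1/3 of a site if it
    leaves the encoded amino acid (or stop) unchanged.
    """
    aa = CODE[codon]
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    sites += 1 / 3
    return sites


def pn_ps(seq1: str, seq2: str) -> Tuple[float, float]:
    """Nonsynonymous (pN) and synonymous (pS) differences per site between two aligned ORFs."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn_sites = nonsyn_sites = 0.0
    syn_diffs = nonsyn_diffs = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        # Site counts are averaged over the two sequences.
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        n_diff = sum(a != b for a, b in zip(c1, c2))
        if n_diff == 0:
            continue
        if n_diff > 1:
            raise ValueError(f"codon {i // 3 + 1} differs at more than one position")
        if CODE[c1] == CODE[c2]:
            syn_diffs += 1     # silent change
        else:
            nonsyn_diffs += 1  # amino acid replacement
    return nonsyn_diffs / nonsyn_sites, syn_diffs / syn_sites


# CTG -> CTA is silent (Leu -> Leu); GAA -> GAT replaces Glu with Asp.
pn, ps = pn_ps("CTGAAAGAA", "CTAAAAGAT")
print(f"pN = {pn:.3f}, pS = {ps:.3f}, pN/pS = {pn / ps:.2f}")
```

Per site, this toy pair shows fewer replacement changes than silent ones (ratio below 1), which is the pattern typical of most genes. The codons in Table 1 are those for which the codon models instead infer a ratio above 1.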

Structures have been solved for human ACE2 (36) and for human ACE2 in complex with the SARS-CoV spike protein (17). Of the 19 ACE2 codons under positive selection in bats, 17 correspond to residues included in these structures, and all 17 are surface-exposed in human ACE2. Six of these (Q24, T27, K31, H34, M82, and E329) (colored red in Fig. 1a) make direct contact with the SARS-CoV spike protein (gray structure in Fig. 1a). These six residues are highly variable between and within bat species (Fig. 1b). Five of these residues (colored red in Fig. 1c) form a single ridge that intimately contacts the virus spike (gray). Two of the residues in this ridge (K31 and H34) mediate interaction with N479 in the SARS-CoV RBD (17, 20), a key position in the virus that acquired critical mutations during emergence (16, 20, 21, 26, 30). Species-specific differences at four residues in this ridge (residues 27, 31, 34, and 82) are known to contribute to the species specificity of receptor usage by SARS-CoV (11, 17). These evolutionary signatures indicate that bats have been coevolving with an agent that drives rapid evolution at this ACE2 interface. The footprints left by this interaction track remarkably well with the residues that interact with SARS-CoV.
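The mapping described above can be sketched as two small set operations: positively selected alignment columns are first converted to human ACE2 residue numbers, and those residues are then checked for contact with the spike in the cocrystal. This is a minimal illustration under stated assumptions. Residue numbering follows a reference sequence, as Table 1 uses human coordinates. "Contact" is assumed to mean any pair of atoms within 4 Å, which is a common but arbitrary cutoff. The coordinates below are toy values and are not taken from PDB 2AJF. With a real structure, the atom coordinates would come from a structure parser such as Biopython's Bio.PDB module, and the chain identifiers would need to be checked against the file. All function and variable names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]


def column_to_reference(ref_aligned: str) -> Dict[int, int]:
    """Map 1-based alignment columns to 1-based residue numbers of the reference.

    Gap characters ('-') in the reference consume a column but not a residue
    number, which is how insertions in other species are absorbed.
    """
    mapping: Dict[int, int] = {}
    resnum = 0
    for col, aa in enumerate(ref_aligned, start=1):
        if aa != "-":
            resnum += 1
            mapping[col] = resnum
    return mapping


def contact_residues(receptor: Dict[int, List[Coord]],
                     ligand: Dict[int, List[Coord]],
                     cutoff: float = 4.0) -> Set[int]:
    """Receptor residues with any atom within `cutoff` angstroms of any ligand atom."""
    ligand_atoms = [xyz for atoms in ligand.values() for xyz in atoms]
    return {
        resnum
        for resnum, atoms in receptor.items()
        if any(math.dist(a, b) <= cutoff for a in atoms for b in ligand_atoms)
    }


# Toy example: a hypothetical aligned reference fragment with one gap column.
ref = "QA-TK"
columns_under_selection = [1, 4, 5]
col_map = column_to_reference(ref)
selected = {col_map[c] for c in columns_under_selection if c in col_map}

# Toy coordinates (one pseudo-atom per residue), NOT from a real structure.
receptor = {1: [(0.0, 0.0, 0.0)], 2: [(10.0, 0.0, 0.0)],
            3: [(0.0, 1.0, 0.0)], 4: [(20.0, 0.0, 0.0)]}
spike = {1: [(0.0, 0.0, 3.5)]}
contacts = contact_residues(receptor, spike)

print("selected (reference numbering):", sorted(selected))
print("selected and in contact:", sorted(selected & contacts))
print("selected, no contact:", sorted(selected - contacts))
```

This brute-force distance check is quadratic in the number of atoms. That is fine for a sketch, but a spatial index (such as the neighbor search provided by structure libraries) would be the usual choice for full protein complexes.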
Additional lines of evidence suggest that the virus driving this evolutionary signature in bat ACE2 is very similar to SARS-CoV. First, NL63-CoV, another human coronavirus, interacts with the same surface of the ACE2 receptor (8, 9, 38, 39), yet the residues under positive selection in bats track specifically with SARS-CoV-interacting residues rather than with residues shown to mediate interactions with NL63-CoV (Fig. 1d). Second, some positions under positive selection in bat ACE2 (numbered tick marks in Fig. 2a) do not map to the SARS-CoV-binding surface. However, five of these cluster around a key glycosylation site at position 90 of human ACE2 (Fig. 2b). Although it sits well outside the central SARS-CoV-binding surface (shown at left), this glycan has been shown to alter SARS-CoV binding (21). Position 90 is conserved as an asparagine in many bat species (see Fig. S1 in the supplemental material), and the attached glycan (not shown) faces the virus RBD (gray structure in Fig. 2b) (17). The residues at its base may be experiencing positive selection for amino acid replacements that alter the spatial orientation of this glycan moiety, a process that would constitute a novel genetic mechanism for host adaptation. Because the signatures of positive selection recorded in bat ACE2 have accumulated at residues known to govern binding of the SARS-CoV spike to human ACE2, we conclude that a virus very similar to SARS-CoV must have left this evolutionary footprint on ACE2 in bats.

These results are consistent with a model in which an ACE2-utilizing bat coronavirus infected civets and/or other intermediate hosts, or possibly was even transmitted directly to humans. This virus could have preexisted in bats or could have been a newly created virus resulting from recombination between two bat coronaviruses. The data do not support the less parsimonious model in which ACE2 utilization was acquired after transmission of a bat coronavirus to another species. Others have also concluded that phylogenetic incongruencies within coronavirus genomes (28, 33, 34) do not necessarily support a model of interhost virus recombination during the emergence of SARS-CoV but may instead simply reflect differences in evolutionary rates between different coronavirus genes (10). The idea that bats have been coevolving with SARS-CoV-like viruses over long periods of time is supported by the high prevalence of SARS-CoV antibodies found in bat populations of multiple species sampled from different geographic regions in China (18). This evolutionary analysis of ACE2 sheds light on the history of emergence of this zoonotic virus from bat reservoirs. Similar insight was recently gained into the emergence of canine parvovirus by analyzing the evolution of its receptor, TfR, in the carnivore species from which it arose (12). Likewise, based on evolutionary patterns in the gene encoding the Duffy antigen receptor for chemokines (DARC), we recently proposed that simian primates are an ancient reservoir for malaria-causing Plasmodium (3). These are the first examples demonstrating that evolutionary studies of cellular receptors may be broadly useful in understanding disease emergence.